This interim report summarizes the effort expended from July 1, 1976
to December 31, 1976 on RADC Contract No. F-30602-74-C-0294. The
major topics investigated and participating personnel during this period are
listed in Section II. Summaries of these topics appear in Section III.
Section IV indicates the planned direction of the research for the immediate
future. Section V reports on the professional activities of the staff during
the reporting period.


SECTION  II 

PERSONNEL  AND  WORK  AREAS 

The following personnel participated in the research activities during
this reporting period:

M. Shooman
H. Ruston
D. Baggi
C. Marshall
E. Berlinger
S. Natarajan
A. Laemmel
G. Popkin
E. Lipshitz
B. Rudner

and worked in the following areas:

1. Shooman and Natarajan:

The macro model previously developed for the estimation of the
number of errors has been extended. The extension involved the modeling
of errors generated during debugging, and has been described in a technical
report released in August 1976.

2. Shooman:

A new approach to the estimation of software reliability was initiated
during this period. The approach centers on a micro reliability model.
Such a model incorporates representative features of the internal program
structure. Specific parameters of the model are the path (module)
traversal frequencies and times, together with the path failure
probabilities.

3. Ruston and Berlinger:

Work  has  begun  on  extensions  of  software  physics  formulas  to  define 
a measure  of  complexity.  The  automation  of  gathering  of  statistics  has 
been  initiated  for  the  validation  of  the  extended  formulas. 

4. B. Rudner:

The seeding/tagging estimate formulas have been developed and are
described in a technical report (released in November 1976). Several
plans for small scale experimentation to obtain experience with the method
are being considered.

5. Ruston and Shooman:

Planning and execution of small scale tests for gathering parameters
for the seeding/tagging estimates, the micro-models, and the
extended software physics formulas.

6. Lipshitz and Shooman:

Continuation  of  work  on  automatic  and  modular  techniques  for  the 
construction  of  low-cost,  low-error  content  application  programs. 

7. Shooman and Laemmel:

A new approach to the measurement of program length and complexity
has been undertaken. This approach is based upon the application of
statistical natural language theory. The specific theory exploited is the
work by Zipf on word probabilities (in the 1930's), which has been shown to
apply to program operands and operators. The results are presently being
compared with analogous results achieved through the use of software
physics formulas.

8. Popkin and Shooman:

Work is continuing on algorithms for the enumeration of feasible
program test paths.

9. Baggi and Shooman:

Continuation  of  the  effort  on  driver  programs.  The  driver  programs 
developed  so  far  are  limited  to  programs  with  no  loops.  This  work  is  being 
generalized  to  apply  to  all  loopless  programs.  Attention  is  focused  on 
eliminating  this  loopless  restriction. 

SECTION  III 

SUMMARY  OF  PROGRESS 


In this section we describe briefly the work performed. Upon the
completion of each task, a complete technical report will be issued.
Several technical reports which document either the completed or continued
research are in various stages of preparation.


3.1 Upper Bounds on the Number of Tests Needed to
Verify a Computer Program

by Gary S. Popkin


3.1.1 Introduction


In an earlier report [1] the upper and lower bounds on the minimum
number of program test cases were discussed, with application to
flowcharts containing two-way decisions. The conditions for reaching the
lower and upper bounds were discussed, and examples given. In this work,
these ideas are extended to flowcharts containing three-way (e.g., A < B,
A = B, A > B) and multi-way decisions.


3.1.2 Upper and lower bounds on the number of tests needed to verify a
program


Each of the two flowcharts in Figure 1 contains four three-way decisions.
The numbering of the segments, with two segment numbers on some of the
flow lines, indicates that the decisions are three-way. In Figure 1(a), the
methods of [2] would yield a maximum incomparable set size (and hence a
lower bound on the minimum number of tests) of 3. It will be shown below
how the upper bound may be computed, and how flowchart contents can
be inserted to raise the minimum number of tests required toward the
upper bound.

In Figure 1(b), the methods of [2] yield 9 as the size of the maximum
incomparable set, and hence as the lower bound on the minimum number of
tests. 9 is also the upper bound, for no flowchart contents can raise the
minimum number of required tests above 9.

Minimum  number  of  tests  for  charts  with  three-way  deciders 

In  a loopless  flowchart  with  three-way  decisions,  the  upper  bound  on 
the  minimum  number  of  tests  needed  to  pass  through  each  segment  at  least 
once  is  given  by 


u = 2d  + 1 

where  d is  the  number  of  deciders  in  the  flowchart. 

Proof: Consider a flowchart with no deciders. Such a flowchart
consists of one segment and requires one test. Each three-way decider added
to the flowchart can require at most two additional tests, so u = 2d + 1.

Minimum number of tests for charts with multi-deciders

In a flowchart where the deciders may have differing numbers of
outcomes, the upper bound on the minimum number of tests needed to pass
through each segment at least once is given by

\[ u = \sum_{i=1}^{n} w_i - n + 1 \]

where $w_i$ is the number of outcomes of decider i, and n is the number
of deciders in the flowchart.

Proof: Consider a flowchart with no deciders. It consists of one
segment and requires one test. Each decider i can require at most $w_i - 1$
additional tests, so

\[ u = \sum_{i=1}^{n} (w_i - 1) + 1 = \sum_{i=1}^{n} w_i - n + 1 \]

as asserted.
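
As a quick numerical check of the two bounds, the following minimal Python
sketch evaluates u from a list of decider outcome counts. The function name
`upper_bound_tests` and the rejection of deciders with fewer than two
outcomes are illustrative assumptions, not part of the report.

```python
def upper_bound_tests(outcomes):
    """Upper bound on the minimum number of tests for a loopless flowchart.

    `outcomes` lists w_i, the number of outcomes of each decider.
    Implements u = sum(w_i) - n + 1, i.e. one test for the bare flowchart
    plus at most (w_i - 1) additional tests per decider.
    """
    outcomes = list(outcomes)
    if any(w < 2 for w in outcomes):
        # Assumption: a decider must have at least two outcomes.
        raise ValueError("each decider needs at least two outcomes")
    return sum(w - 1 for w in outcomes) + 1


if __name__ == "__main__":
    # Four three-way deciders: u = 2d + 1 with d = 4.
    print(upper_bound_tests([3, 3, 3, 3]))
```

For four three-way deciders the sketch returns 9, which agrees with the
special case u = 2d + 1 for d = 4.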

Figure  2 portrays  the  flowchart  of  Figure  1(a)  with  contents.  The 
minimum  number  of  tests  needed  to  pass  through  each  segment  at  least 
once  is  now  7.  The  computed  upper  bound  for  the  flowchart  is  9. 

In the deciders in Figure 2, the segment numbers have the following
meanings:

Segment No.    Outcome

     1         P = 8
     2         P > 8
     4         P = 5
     5         P > 5
     7         P = 2
     8         P > 2
    10         P = 0
    11         P < 0

[Figure 2. A Flowchart with Contents]

If the input variable P may take on the seven values 1, 2, 4, 5, 7, 8,
and 9, then the flowchart of Figure 2 would require seven tests to traverse
each segment at least once.

P    Path traversed

1    3-6-9-11
2    3-6-7-10
4    3-6-8-12
5    3-4-8-12
7    3-5-8-12
8    1-5-8-12
9    2-5-8-12

The above illustrates the calculation of the upper bounds on the
number of tests. Work is continuing on obtaining the actual number of tests
rather than just a pessimistic upper bound.


1. Gary S. Popkin, "Program Paths and the Minimum Number of Tests
Needed to Verify a Computer Program," Summary of Technical
Progress, Software Modeling Studies, July 1, 1975-December 31, 1975,
Polytechnic Institute of New York.

2. M. Lipow, "Application of Algebraic Methods to Computer Program
Analysis," Report TRW-SS-73-10, TRW Software Series, May 1973.


3.2 Extensions of Software Physics to Measures of Complexity
by E. Berlinger and H. Ruston


3.2.1 Introduction

In his work on software physics, M. Halstead [1] introduced a measure of
complexity based upon certain program parameters. With the parameters:

$N_1$ = number of distinct operators

$N_2$ = number of distinct operands

a measure is

\[ V = N_1 \log_2 N_1 + N_2 \log_2 N_2 \]

which he calls the program volume. Additional definitions and theorems
are also given or conjectured. These relate to the effort in programming
and the total programming time, and provide an estimate of the program
length.

Mathematically, the formulas are empirical and the proofs heuristic,
but the results agree well with experimental data. The present effort
attempts to improve on the Halstead scheme.


3.2.2 Outline of Current Work


One of the principal purposes of the work is to refine the definition of
program volume to make it mathematically more sound, and also to include
the frequency of usage of the various program constructs. If we define:

$f_i$ = frequency of usage of the i-th operator

$p_i$ = probability of usage of the i-th operator

$f_j$ = frequency of usage of the variable whose rank is j

$p_j$ = probability of usage of the variable whose rank is j

we can then define a measure of complexity as

\[ -\sum_i f_i \log_2 p_i - \sum_j f_j \log_2 p_j \]

There  is  strong  justification  for  this  definition  from  an  information  theory 
point  of  view.  This  measure  should  correlate  well  with  the  number  of  bugs 
in  a program.  If  so,  then  the  number  of  bugs  can  be  predicted  from  an 
initial  version  of  the  program. 
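
To make the proposed measure concrete, the sketch below sums the negative
of frequency times log base 2 of probability over the operators and then
over the variables. It is only an illustration under stated assumptions:
the function name `complexity_measure` and the representation of the
statistics as (frequency, probability) pairs are hypothetical, and the
probabilities are taken as given (in the report they are to come from a
collected corpus of programs).

```python
import math


def complexity_measure(operator_stats, operand_stats):
    """Return -sum(f * log2(p)) over operators plus the same sum over operands.

    Each argument is an iterable of (frequency, probability) pairs:
    frequency is the usage count within the program being measured,
    probability is the usage probability of that operator or variable.
    """
    total = 0.0
    for frequency, probability in list(operator_stats) + list(operand_stats):
        if not 0.0 < probability <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        # A rarely used construct (small p) contributes more per use.
        total -= frequency * math.log2(probability)
    return total


if __name__ == "__main__":
    operators = [(2, 0.5), (2, 0.5)]   # two operators, each used twice
    variables = [(4, 1.0)]             # a single variable used four times
    print(complexity_measure(operators, variables))
```

In this toy case each operator use carries one bit and the single variable
carries none, so the measure is 4 bits.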

Work is currently focused on automating the process for gathering
the statistics necessary to obtain the $p_i$ and $p_j$. To this end, the
operating system of an IBM 370/125 is being modified to copy all error-free
FORTRAN student programs onto a tape. Student programs collected over
a full semester will then yield the necessary probabilities.

Statistics on errors will also be collected and automatically copied
onto a second tape. Specifically, the FORTRAN error numbers will be
obtained from the output queue before printing. These will give the syntax
and run-time errors. To obtain a count of logical errors, a count of the
total number of times a program is run is being kept. It is assumed that
the number of logical errors is one less than the number of runs which
yield no syntax or run-time errors. This may be an underestimate, but
since the programs being used are written by first-year programming
students, it is reasonable to assume that they find only one logical error
at a time.
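
The counting rule for logical errors can be written down in a few lines.
This is a minimal sketch under the assumption stated in the text (one
logical error found per clean run, with the final clean run being the
correct one); the function name and the boolean run log are hypothetical.

```python
def estimate_logical_errors(run_had_reported_error):
    """Estimate logical errors as (number of clean runs) - 1.

    `run_had_reported_error` holds one boolean per run of a program:
    True if the run produced a syntax or run-time error, False otherwise.
    """
    clean_runs = sum(1 for had_error in run_had_reported_error if not had_error)
    # With no clean run at all, no logical error has been observed yet.
    return max(clean_runs - 1, 0)


if __name__ == "__main__":
    # Two runs with reported errors, then three clean runs.
    print(estimate_logical_errors([True, True, False, False, False]))
```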

To  obtain  the  frequency  counts,  the  tape  containing  the  error-free 
runs  will  be  run  through  a program  supplied  by  Professor  Halstead  and  Mr. 
Ottenstein  of  Purdue,  which  analyzes  FORTRAN  programs  and  yields  the 
frequencies  automatically.  All  programs  on  both  tapes  will  be  sufficiently 
identified  so  that  the  bugs  can  be  correlated  with  the  frequency  counts. 

Obtaining the probabilities $p_i$ and $p_j$ is a secondary purpose of the
project and will supplement some statistics obtained previously by D. Knuth.


3.2.3 Tests for Obeyance of Zipf's Law


It is expected that the probabilities $p_i$ and $p_j$ will also obey Zipf's
law, either in its pure form (i.e., $p_r = c/r$, where $p_r$ is the probability
of the operator or operand whose rank is r), or in one of its modified forms
(e.g., $p_r = c/(r + a)^n$). If the frequencies also follow a Zipf's law, it
may be possible to get a criterion for program length. This, however,
remains to be seen.


1. M. Halstead, "Software Physics, Basic Principles," IBM Research
Report RJ1582, IBM Research, Yorktown Heights, N.Y., May 1975.


3.3 Complexities of Natural and Computer Languages
by M. L. Shooman and A. Laemmel

3.3.1 Introduction

There is a great need for theoretical models which describe programs
and allow us to quantitatively estimate complexity, running time, storage
requirements, and development time. In addition to serving as an estimate
during the initial design period, they can be refined as the program develops
and used as a management and analysis tool. They can also be used to
compare initial design approaches, programming styles, different algorithms,
etc. Early work on such a theory has been initiated [1]. This work discusses
the linguistic theory (Zipf's laws), extends it to programming languages,
and develops equations for program length based on these principles. This
work is similar to that of Halstead, which is commonly known as "Software
Physics" [2].

There are many similarities between natural and computer languages,
and we will make use of the analogy of nouns to operands and of verbs to
operators. The similarities in structure and content of natural and
computer language are further illustrated via the following thought problem.
Suppose we take a programmer who understands a particular computer
language and give him the complete listing, comments, and documentation
for a computer program. We instruct him to study the computer program
until he understands it and then produce a report written in good English
containing paragraphs, complete sentences, and algorithms written as a
sequence of steps in good English without mathematical notation. In
principle, the report and the computer program would be equivalent.

3.3.2 Zipf's Law

Before we discuss Zipf's law it is convenient to introduce a few terms
used in dealing with natural language. We use the term token to refer to all
the words of the written or spoken sample. The term type is used to refer to
the vocabulary of words in the sample. Much of our effort will be centered
on counting the number of times, $n_r$, particular types occur in a
sample of n tokens containing t types. The most frequently occurring
type will be assigned rank r = 1, the second most frequent type rank r = 2,
and the least frequent type rank r = t. Thus

\[ \sum_{r=1}^{t} n_r = n \qquad (1) \]

The absolute frequency of occurrence for type r is $n_r$; the
relative frequency of occurrence $f_r$ is simply $n_r/n$.

Zipf studied the relationship between relative frequency of occurrence
$f_r$ and rank r for words from English, Chinese, and the Latin of
Plautus [3].

Careful study of Zipf's data and that of others shows that $f_r$ vs. r
plots as a straight line on log-log paper, with a unity slope (in
magnitude); thus we arrive at the simple relationship called Zipf's law,

\[ f_r \cdot r = c \qquad (2a) \]

\[ n_r = \frac{cn}{r} \qquad (2b) \]

Inspection of Eq. (2a) shows that the constant c can be interpreted
as the relative frequency of the rank 1 word type (also the intercept
with the r = 1 line).
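
A rank-frequency table and a log-log slope can be computed directly from a
list of tokens. The sketch below is an illustration only: the function
names `rank_frequency` and `log_log_slope` are assumptions, the slope is an
ordinary least-squares fit (the report plots on log-log paper instead), and
the token sample is artificial, built so that the product of relative
frequency and rank is the same for every type.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return [(rank, n_r, f_r)], rank 1 being the most frequent type."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    n = sum(counts)
    return [(r, n_r, n_r / n) for r, n_r in enumerate(counts, start=1)]


def log_log_slope(table):
    """Least-squares slope of log(f_r) against log(r); needs >= 2 types."""
    xs = [math.log(r) for r, _, _ in table]
    ys = [math.log(f) for _, _, f in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Artificial sample with counts 6, 3, 2, so that f_r * r is constant.
    sample = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
    table = rank_frequency(sample)
    for rank, count, rel in table:
        print(rank, count, round(rel * rank, 4))
    print("slope:", round(log_log_slope(table), 4))
```

For real program text the products of relative frequency and rank will
only be approximately constant, and the fitted slope only approximately
minus one.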


3.3.3 Type Token Equation


If we sum both sides of Eq. (2b), we obtain using Eq. (1)

\[ n = \sum_{r=1}^{t} n_r = cn \sum_{r=1}^{t} \frac{1}{r} \qquad (3) \]

The summation of the series 1/r is given by [4]

\[ \sum_{r=1}^{t} \frac{1}{r} = 0.5772 + \ln t + \frac{1}{2t} - \frac{1}{12t(t+1)} - \cdots \qquad (4) \]

Substitution from Eq. (4) into Eq. (3) (retaining only 2 terms for modest
size t) yields

\[ n \approx cn (0.5772 + \ln t) \qquad (5) \]

We can eliminate the constant c from Eq. (5) by considering the
behavior of Eq. (2b) for the largest rank, $r_{max} = t$ (e.g., if
there are 100 types, then the largest rank is obviously 100). In most cases
the rarest type (largest rank) will occur only once; thus $n_{r_{max}} = 1$.
Substituting these values into Eq. (2b) yields c = t/n, which when combined
with Eq. (5) gives

\[ n = t(0.5772 + \ln t) \qquad (6) \]
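
The two-term approximation and the resulting token estimate are easy to
check numerically. The following sketch is illustrative only; the function
names are assumptions, and the constant 0.5772 is the rounded value of
Euler's constant used in the text.

```python
import math

EULER_CONSTANT = 0.5772  # rounded value used in Eqs. (4)-(6)


def harmonic_sum(t):
    """Exact partial sum 1 + 1/2 + ... + 1/t."""
    return sum(1.0 / r for r in range(1, t + 1))


def harmonic_two_term(t):
    """Two-term approximation 0.5772 + ln t."""
    return EULER_CONSTANT + math.log(t)


def estimated_tokens(t):
    """Eq. (6): n = t (0.5772 + ln t) for a sample containing t types."""
    return t * harmonic_two_term(t)


if __name__ == "__main__":
    t = 100
    print("exact sum:      ", round(harmonic_sum(t), 4))
    print("two-term approx:", round(harmonic_two_term(t), 4))
    print("estimated n:    ", round(estimated_tokens(t), 1))
```

For t = 100 the exact sum and the two-term approximation differ by about
0.005, and Eq. (6) predicts roughly 518 tokens.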


3.3.4  Summary  of  Experimental  Results 

The results to date have shown that operators, operands, and the
sum of operators plus operands all plot as reasonably straight lines with a
slope of unity on log-log paper (i.e., they fit Zipf's law) for:

a. An 11-line and a 27-line PL/1 program (55 and 222 tokens)

b. The MIKBUG machine language executive program for the
M6800 microprocessor (322 tokens)

c. Operators in PDP-11 assembly language programs (1572
tokens)

d. Variable names in 3 PL/1 programs (320, 238, and 193
tokens)

3.3.5 Relationship to "Software Physics"


The initial motivation for the application of Zipf's law to computer
languages came from a review of Halstead's [2] work on Software Physics.
Early in his work he arrives at a formula for program length

\[ L = N_1 \log_2 N_1 + N_2 \log_2 N_2 \qquad (7) \]

where

L = Program length
$N_1$ = Number of distinct operator types
$N_2$ = Number of distinct operand types

In terms of our notation the analogous quantities are

\[ t = N_1 + N_2 \qquad (8) \]

\[ n = L \qquad (9) \]

Note that Eqs. (7) and (6) are of similar form. In Ref. 1 we compare the
actual number of tokens (counted) with the number of tokens calculated
using both Eqs. (6) and (7). The average error and average magnitude error
are computed, and both equations yield good agreement (10-20%) between
actual and calculated results.
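
The similarity of the two length formulas can be seen by evaluating both
for the same vocabulary. The sketch below is a hedged illustration: the
function names are assumptions, and the operator and operand counts are
made-up values chosen for easy arithmetic, not data from the report.

```python
import math


def zipf_length(types):
    """Eq. (6): token length from the total number of types t."""
    return types * (0.5772 + math.log(types))


def halstead_length(operator_types, operand_types):
    """Eq. (7): length from the distinct operator and operand type counts."""
    return (operator_types * math.log2(operator_types)
            + operand_types * math.log2(operand_types))


if __name__ == "__main__":
    n1, n2 = 16, 16  # hypothetical counts of operator and operand types
    print("Eq. (7):", halstead_length(n1, n2))
    print("Eq. (6):", round(zipf_length(n1 + n2), 1))
```

With 16 operator types and 16 operand types (t = 32), Eq. (7) gives 128
tokens and Eq. (6) gives about 129.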


3.3.6 Estimation of Program Length Early in Design

One method of initially estimating program length (token length*) is
to estimate the number of tokens. We assume the analyst initially has a
complete description of the problem and that a partial analysis and choice of
key algorithms has been made. An elementary approach might be to
estimate the token size by

(1) Estimating the number of operator types which will be used in
the language by the assigned programmers.

(2) Estimating the number of input variables, output variables,
intermediate variables, and constants needed.

(3) Summing the estimates of steps (1) and (2) and substituting in Eq. (6).


* In addition one must add other classes of statements and programming
elements such as: comments, declares, certain assembler directives,
etc.

Clearly we might consult past programs written by the assigned
programmers for data on the number of operator types. Also, a large program
will be stated in a comprehensive specification document, written in English
or in a specification language. This should name the input and output
variables; thus our estimate will mainly deal with predicting the number of
intermediate variables and constants.


1. A. Laemmel and M. Shooman, "Statistical (Natural) Language Theory
and Computer Programs," Poly EE/EP Report, January 1977.

2. M. Halstead, "Software Physics, Basic Principles," IBM Research
Report RJ1582, IBM Research, Yorktown Heights, N.Y., May 1975.

3. George K. Zipf, "The Psycho-Biology of Language: An Introduction to
Dynamic Philology," M.I.T. Press, 1965. (First Houghton Mifflin
Edition, 1935).

4. L. Jolley, "Summation of Series," p. 36, n. 200, and p. 14, n. 70, Dover
Publications, New York, 1961. (Note the constant 0.5772 is called
Euler's constant, see "Differential and Integral Calculus," R. Courant,
vol. 1, Interscience Publishers, New York, 1951, pp. 381, 420).


3.4 Experimental Verification of Debugging Models
by D. L. Baggi


3.4.1 Introduction


The object of this study is the implementation of a so-called driver
program which, given a program to test, will force the traversal of
all its possible paths. The advantage of such a procedure is obvious for
programs with several branches and decision points; they are usually
debugged by laborious construction of a data set, which hopefully would cause
exploration of all paths. The method described here forces exhaustive
testing of all possible paths, with no need for the design of a data set.


3.4.2 Initial Drivers


The first effort in the definition of the driver program consisted of the
implementation of a program capable of traversing all paths of a
given program. This program iteratively substitutes, in place of the
condition in a PL/1 IF-statement, a value of zero or one, alternately. With
this done concurrently for all conditional statements, eventually one
traverses once all possible paths. It was realized (already by Shooman, in his
paper "Analytical Models for Software Testing") that the resulting number of
paths, $2^n$ for n conditional branches, is merely an upper bound on the
number of possible paths; the real number lies, in fact, between n + 1 and
$2^n$. Hence the above scheme wastes execution time, which increases
exponentially (for example, a program with thirteen conditional branches
and fourteen paths may require 8192 runs, of which 8178 are meaningless!).
Thus an algorithm had to be designed to consider the possible paths only.
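
The gap between the two counts can be reproduced with a small experiment.
The sketch below is not the original driver; it is a Python illustration
that assumes the conditionals form a single ELSE IF chain (the case that
attains the lower limit of n + 1 paths), and its function names are
hypothetical.

```python
from itertools import product


def brute_force_runs(n_conditions):
    """Runs needed when every combination of 0/1 values is forced."""
    return 2 ** n_conditions


def distinct_paths_in_else_if_chain(n_conditions):
    """Count the distinct paths actually exercised by those forced runs.

    In an ELSE IF chain, execution stops testing conditions at the first
    one that is true, so later forced values have no effect on the path.
    """
    paths = set()
    for bits in product((1, 0), repeat=n_conditions):
        path = []
        for bit in bits:
            path.append(bit)
            if bit == 1:
                break
        paths.add(tuple(path))
    return len(paths)


if __name__ == "__main__":
    n = 13
    runs = brute_force_runs(n)
    paths = distinct_paths_in_else_if_chain(n)
    print(runs, paths, runs - paths)
```

With 13 chained conditions the brute-force scheme makes 8192 runs but only
14 of them follow distinct paths.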


3.4.3 Present Drivers

The present algorithm scans a PL/1 program. It searches for keywords
such as IF, THEN, ELSE, DO and END, while constructing a regular
expression of zeroes and ones. Resolution of this regular expression
yields a set of binary integers, which represent the status of the conditional
expressions during each of the runs through all possible paths. In fact,
each bit of such an integer represents the value of the next conditional
expression met during execution, hence forcing a zero- or one-branch
accordingly. Since those integers were derived from the very structure of the
algorithm of the program, they represent indeed each path; as an extra bonus,
their total number is the number of possible paths. Hence the algorithm
enumerates each path, uniquely describing it in terms of its branches, and
also counts them.

The algorithm works as follows:

- each expression is binary; it contains two terms separated by
  a + sign

- the terms can be only 1, 0, or 1 or 0 concatenated with a
  binary expression

Scanning rules:

a. each IF opens a left parenthesis, ( ;

b. each THEN corresponds to a 1 ;

c. each ELSE corresponds to a 0 ;

d. each well completed binary expression, with both terms
completed, compels closing with a right parenthesis at its
level.

Note: The ELSE clause is assumed, for convenience, to be always present.
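
The construction rules can be illustrated on a program that has already
been parsed into nested IF structures. This is a simplifying assumption:
the sketch below does not scan PL/1 text, the class `If` and the function
`path_expression` are hypothetical names, and it is written in Python
rather than the LISP used in the project.

```python
from dataclasses import dataclass


@dataclass
class If:
    """An IF statement with its THEN body and its ELSE body.

    Each body is a list whose items are either nested `If` objects or
    plain strings standing for ordinary statements.
    """
    then_body: list
    else_body: list


def path_expression(body):
    """Build the expression of zeroes and ones for a list of statements.

    Each IF contributes "(" + "1" + THEN-part + "+" + "0" + ELSE-part + ")";
    ordinary statements contribute nothing, and consecutive IFs concatenate.
    """
    parts = []
    for node in body:
        if isinstance(node, If):
            parts.append(
                "(1" + path_expression(node.then_body)
                + "+0" + path_expression(node.else_body) + ")"
            )
    return "".join(parts)


if __name__ == "__main__":
    simple = If(["stmt"], ["stmt"])
    print(path_expression([simple, simple, simple]))
    nested = If([If([If(["stmt"], ["stmt"])], ["stmt"])], ["stmt"])
    print(path_expression([nested]))
```

Three consecutive IF statements give `(1+0)(1+0)(1+0)`, and three IF
statements nested in successive THEN clauses give `(1(1(1+0)+0)+0)`.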

3.4.4 Examples


    IF cond THEN stmt;
            ELSE stmt;
    IF cond THEN stmt;
            ELSE stmt;
    IF cond THEN stmt;
            ELSE stmt;

yields

    (1+0)(1+0)(1+0)

or the eight paths

    111, 011, 101, 001, 110, 010, 100, 000


    IF cond THEN
       IF cond THEN
          IF cond THEN stmt;
                  ELSE stmt;
       ELSE
          IF cond THEN stmt;
                  ELSE stmt;
    ELSE
       IF cond THEN
          IF cond THEN stmt;
                  ELSE stmt;
       ELSE
          IF cond THEN stmt;
                  ELSE stmt;

Result:

    (1(1(1+0)+0(1+0))+0(1(1+0)+0(1+0)))

which gives the eight paths

    111, 110, 101, 100, 011, 010, 001, 000


    IF cond THEN stmt;
    ELSE IF cond THEN stmt;
    ELSE IF cond THEN stmt;
    ELSE IF cond THEN stmt;
    ELSE stmt;

yields:

    (1+0(1+0(1+0(1+0))))

or

    1, 01, 001, 0001, 0000


    IF cond THEN DO;
       IF cond THEN DO;
          IF cond THEN stmt;
          ELSE stmt;
       END;
       ELSE stmt;
    END;
    ELSE stmt;

gives:

    (1(1(1+0)+0)+0)

or

    111, 110, 10, 0
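
The resolution of such an expression into its bit strings can be sketched
with a small recursive parser. This is an illustration in Python, not the
LISP implementation described in the report; the names `expand` and
`_parse_sequence` are assumptions, and the grammar (a sequence is a
concatenation of bits and parenthesized two-term alternatives) is inferred
from the examples.

```python
def _parse_sequence(s, pos):
    """Parse concatenated bits and groups; return (paths, next position)."""
    paths = [""]
    while pos < len(s) and s[pos] not in "+)":
        if s[pos] == "(":
            # A group is "(" term "+" term ")"; its paths are the union
            # of the paths of its two terms.
            left, pos = _parse_sequence(s, pos + 1)
            if pos >= len(s) or s[pos] != "+":
                raise ValueError("expected '+' at position %d" % pos)
            right, pos = _parse_sequence(s, pos + 1)
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1
            options = left + right
        elif s[pos] in "01":
            options = [s[pos]]
            pos += 1
        else:
            raise ValueError("unexpected character %r" % s[pos])
        # Concatenation: every path so far continues with every option.
        paths = [p + o for p in paths for o in options]
    return paths, pos


def expand(expression):
    """Return the list of bit strings (paths) described by `expression`."""
    s = expression.replace(" ", "")
    paths, pos = _parse_sequence(s, 0)
    if pos != len(s):
        raise ValueError("unbalanced expression")
    return paths


if __name__ == "__main__":
    for text in ("(1+0)(1+0)(1+0)", "(1+0(1+0(1+0(1+0))))", "(1(1(1+0)+0)+0)"):
        print(text, "->", ", ".join(expand(text)))
```

The number of strings returned is the number of possible paths, so the same
routine both enumerates and counts them.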


Because of the recursive nature of the algorithm, it has been
implemented in the language LISP. It could eventually be translated into
PL/1. In the meantime, however, the greatest concern still lies in making
sure that such an approach, and such an algorithm, works at all; hence the
choice of LISP for fast implementation. No attention has been given so far to
repetitive DO-groups, which were appropriately handled by the initial
crude algorithm. This is because extension of this algorithm to such loops
is conceptually trivial, and it can be proved that the system would
perform equally well; but, in the meantime, such an extension looks very time
consuming and would by no means add any contribution to the theory of this
debugging model.


3.4.5 Direction for Further Work


The long range idea is to eventually come up with a complete PL/1
package, capable of exploring all paths of a program, paying attention to
IF-statements, DO-loops, etc. In the meantime, however, attention
is given to producing a tentative system capable of showing that such a
project is indeed possible. To this end, the following set of programs is
under construction:

1) a PL/1 program which reads in the object program (the one under
debugging) and translates it into LISP-compatible notation; this is
saved on a file.

2) a LISP program which scans the object program and constructs the
regular expressions, with results saved on a second file.

3) a PL/1 driver program which reads the results of these
expressions and forces execution of the object program through all
its possible paths.


3.5 Automatic Programming Techniques


by E. Lipschitz

3.5.1 Introduction

Different  methods  and  techniques  for  writing  a better  software  package 
are  currently  being  sought.  The  goal  is  to  write  programs  which  require  a 
shorter  testing  and  debugging  time  to  achieve  a certain  degree  of  reliability. 

One approach is automatic programming. The use of pre-written and
already tested code modules reduces the number of bugs.


3.5.2 The Program "AUTO-PROGRAMMING"

The working hypothesis is that there exists a high degree of
commonality among commercial applications which can be exploited to automate
the production of code, once processing and output specifications are
defined.

"AUTO-PROGRAMMING" is divided into two parts, "Flow" and
"Auto", both of which are interactive on-line programs that, by
communicating with the user, generate his programs.


A.  The  Program  "Flow" 

"Flow"  receives  the  information  about  the  flow-chart  of  a program 
from  the  user  and  generates  it.  "Flow"  recognizes  only  four  different 
types  of  blocks,  which  are  sufficient  to  generate  any  flow-chart.  They  are: 

Type #1: Control block. A conditional decision block, similar to the
statement If ( ) Go to ( ).

Type #2: Functional block. This block will perform a complete task
selected from those in the computer library.

Type #3: Stop block. This block indicates the end of a path, i.e., a
Stop statement.

Type #4: User's code block. The user inserts the code he wants into this
block. This feature is used whenever the library does not include
programs for the needed task.


Upon  completing  the  flow-chart,  control  passes  to  "Auto".  "Auto"  will 
generate  the  code  for  blocks  type  #2  and  #3,  while  the  user  will  generate  the 
code  for  blocks  type  #1  and  #4. 
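
The division of labor between "Auto" and the user can be summarized in a
small data structure. The sketch below is purely illustrative; the names
`BlockType`, `AUTO_GENERATED`, and `code_source` are assumptions and do not
come from the "AUTO-PROGRAMMING" system itself.

```python
from enum import Enum


class BlockType(Enum):
    CONTROL = 1      # conditional decision block
    FUNCTIONAL = 2   # complete task taken from the library
    STOP = 3         # end of a path
    USER_CODE = 4    # code supplied directly by the user


# Per the text: "Auto" generates types #2 and #3; the user writes #1 and #4.
AUTO_GENERATED = {BlockType.FUNCTIONAL, BlockType.STOP}


def code_source(block_type):
    """Return who supplies the code for a block of the given type."""
    return "Auto" if block_type in AUTO_GENERATED else "user"


if __name__ == "__main__":
    for block_type in BlockType:
        print(block_type.value, block_type.name, code_source(block_type))
```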

B. The Program "Auto"


A collection  of  code  modules  can  be  stored  as  a system  library.  The 
desired  program  can  be  achieved  by  concatenating  different  code  modules 
from  the  system  library  with  code  generated  by  the  user. 

The  user  will  specify  to  "Auto"  what  he  would  like  to  do,  and  "Auto" 
will  advise  him  which  methods  are  available  for  the  solution,  as  well  as 
their  characteristics.  The  user  then  will  choose  the  method  he  prefers, 
and  "Auto"  will  generate  the  needed  code. 


The library currently contains the following modules:

1. Linear Search

2. Binary Search

3. Interchange Sort

4. Shell Sort

5. Sin(x), for $0 < x < 2\pi$

\[ \sin(x) = \sum_{n=0}^{m} \frac{(-1)^n x^{2n+1}}{(2n+1)!} \]

6. Cos(x), for $-2\pi < x < 2\pi$

\[ \cos(x) = \sum_{n=0}^{m} \frac{(-1)^n x^{2n}}{(2n)!} \]

7. Ln(x)

\[ \ln(x) = 2 \sum_{n=0}^{m} \frac{1}{2n+1} \left( \frac{x-1}{x+1} \right)^{2n+1} \]

8. Exp(x)

\[ \exp(x) = \sum_{n=0}^{m} \frac{x^n}{n!} \]

9. Arctan(x), for $|x| < 1$

\[ \arctan(x) = \sum_{n=0}^{m} \frac{(-1)^n x^{2n+1}}{2n+1} \]

10. Bessel Function of the First Kind and Zero Order

\[ J_0(x) = \sum_{n=0}^{m} \frac{(-x^2/4)^n}{(n!)^2} \]

11. Bessel Functions of the First Kind, Orders 1 and 2

\[ J_1(x) = \frac{x}{2} \sum_{n=0}^{m} \frac{(-x^2/4)^n}{(n!)^2 (n+1)} \]

\[ J_2(x) = \frac{2}{x} J_1(x) - J_0(x) \]

12. Modified Bessel Function of the First Kind, Orders 0, 1 and 2

\[ I_0(x) = \sum_{n=0}^{m} \frac{(x^2/4)^n}{(n!)^2} \]

\[ I_1(x) = \frac{x}{2} \sum_{n=0}^{m} \frac{(x^2/4)^n}{(n!)^2 (n+1)} \]

\[ I_2(x) = -\frac{2}{x} I_1(x) + I_0(x) \]

13. Error Function

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt
   = \frac{2}{\sqrt{\pi}} \sum_{n=0}^{m} \frac{(-1)^n x^{2n+1}}{n! \, (2n+1)} \]

14. Fresnel Integral C(x)

\[ C(x) = \int_0^x \cos\left( \frac{\pi}{2} t^2 \right) dt
   = \sum_{n=0}^{m} \frac{(-1)^n (\pi/2)^{2n} x^{4n+1}}{(2n)! \, (4n+1)} \]

15. Fresnel Integral $C_2(x)$

\[ C_2(x) = \frac{1}{\sqrt{2\pi}} \int_0^x \frac{\cos t}{\sqrt{t}} \, dt
   = \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{m} \frac{(-1)^n x^{2n+1/2}}{(2n)! \, (2n + 1/2)} \]

16. Fresnel Integral S(x)

\[ S(x) = \int_0^x \sin\left( \frac{\pi}{2} t^2 \right) dt
   = \sum_{n=0}^{m} \frac{(-1)^n (\pi/2)^{2n+1} x^{4n+3}}{(2n+1)! \, (4n+3)} \]

17. Sine Integral Si(x)

\[ \mathrm{Si}(x) = \int_0^x \frac{\sin t}{t} \, dt
   = \sum_{n=0}^{m} \frac{(-1)^n x^{2n+1}}{(2n+1)! \, (2n+1)} \]

18. Cosine Integral Cin(x)

\[ \mathrm{Cin}(x) = \int_0^x \frac{1 - \cos t}{t} \, dt
   = \sum_{n=1}^{m} \frac{(-1)^{n+1} x^{2n}}{(2n)! \, (2n)} \]

19. Dilogarithm f(x)

\[ f(x) = -\int_1^x \frac{\ln t}{t - 1} \, dt
   = \sum_{n=1}^{m} \frac{(-1)^n (x-1)^n}{n^2} \]


Note: m is so chosen that the magnitude of the m-th term of the power
series is less than or equal to a fixed tolerance (a negative power of ten
whose exponent is illegible in the source), while the magnitude of the
(m-1)-th term is greater than that tolerance.
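
The stopping rule in the note can be illustrated for two of the library
series. This Python sketch is not the library code; the helper names are
hypothetical, and the default tolerance of 1e-8 is an arbitrary assumption
made only so that the example runs, since the actual exponent is not
legible.

```python
import math


def sum_until_small(terms, tol):
    """Add terms up to and including the first whose magnitude is <= tol."""
    total = 0.0
    for term in terms:
        total += term
        if abs(term) <= tol:
            break
    return total


def sin_terms(x):
    """Yield (-1)^n x^(2n+1) / (2n+1)! for n = 0, 1, 2, ..."""
    term, n = x, 0
    while True:
        yield term
        n += 1
        # Ratio of consecutive terms: -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))


def exp_terms(x):
    """Yield x^n / n! for n = 0, 1, 2, ..."""
    term, n = 1.0, 0
    while True:
        yield term
        n += 1
        term *= x / n


def sin_series(x, tol=1e-8):
    return sum_until_small(sin_terms(x), tol)


def exp_series(x, tol=1e-8):
    return sum_until_small(exp_terms(x), tol)


if __name__ == "__main__":
    print(sin_series(1.0), math.sin(1.0))
    print(exp_series(1.0), math.exp(1.0))
```

Computing each term from its predecessor avoids evaluating large powers and
factorials separately, which keeps the sketch short and numerically tame
for moderate arguments.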


3.5.3 Conclusion


The development of "AUTO-PROGRAMMING" will continue to
concentrate mainly on increasing the size of the library. More mathematical
programs, as well as some utility programs for data manipulation, will
be developed in the near future.

3.6 Small Scale Tests


by H. Ruston and M. L. Shooman


3.6.1 Introduction

In order to verify the theoretical models, four programs have been
written by student programmers, and careful records were kept of their
debugging experiences. We describe here the assigned programs and the
error reporting form. The data are presently being reduced, and the
resulting conclusions will be described in a following report.


3.6.2 The Programmers

The programmers were undergraduate students of sophomore-junior
standing, with high interest and ability in programming topics. Because of
this selectivity, we believe their work is likely equivalent to that of
programmers with intermediate experience.

Consequently, we consider the obtained test data to be representative
of normal practice.


3.6.3 The Instructions


The programmers were made aware of the importance of maintaining
careful and truthful records. They were also given the following specific
instructions:

1.  The problems were to be analyzed and coded, with both analysis
    and coding times recorded.

2.  The resulting program was to be corrected of just the syntax errors.
    Their number, the number of runs needed for their correction, and
    the run times were to be recorded. All print-outs were to be saved
    and numbered.

3.  The programs were then presented to us (i.e., M. Shooman and
    H. Ruston). We planned to ask the program author and other members
    of the group to debug each copy independently, recording:

    a.  Number of bugs and types found in each debugging shot

    b.  Analysis time and computer time for each debugging shot

    c.  History of removed bugs and generated bugs.

4.  If a programmer reached a blind alley he had to consult with us. He
    could not ask for other help, or abort the program.

5.  The program had to be constructed with the following constraints:

    a.  To be structured

    b.  To contain no impurities (as listed in Halstead¹)

    c.  The main program to be the control structure with calls to
        modules (i.e., blocks or procedures)

    d.  No module to exceed 50 lines

3.6.4 The Four Programs


Three small problems (problems 1, 2, and 3) and one medium size
problem were generated for the small scale tests. The initial write-ups
of the problems for the desired four programs follow.


Problem #1


Minimum Salary Payroll Adjustment


1.  Statement of the Payroll Adjustment

    Glen Cove University, which employs 200 faculty members, has just
    signed a non-faculty union contract. All salaries of $16,000 or
    higher are to remain unchanged. Any faculty member who earns
    less than $16,000 per year is to receive a pay raise according to the
    following formula: he will receive $100 per year additional for each
    dependent (including himself), plus $50 per year for each year of
    employment. In no case may his new salary exceed $16,000 per year.

    The personnel data on all faculty members is stored on magnetic
    tape in the Business Office and includes present annual salary, number
    of dependents, date of hire, and other information. The problem
    is to write a program which computes and prints out the list of faculty
    members along with their old and new salaries.

    Assume that the Business Office will give you a set of cards with the
    data for each person on one card. You should create your own test
    data; however, final testing of your program will be done on the
    actual card deck. Assume the following arrays will accept the input
    data in your program:

    NAME (200):       contains a name of up to 30 characters

    DHIRE (200):      contains a string of 10 characters: two digits
                      for month, a blank, two digits for day, a
                      blank, and four digits for year

    PRESS SAL (200):  contains 5 digits with yearly salary in rounded
                      dollars


2.  Approaches (write two programs, using each approach):

    (a) Search list of names for those making less than $16,000 per
        year, compute new salary, check $16,000 limit, store new
        salary, print output

    (b) Sort list by salary from lowest to highest, stop when $16,000
        is exceeded, compute new salary, check $16,000 limit, store
        new salary, print output

3.  Language and Computer: PLAGO on Poly 360/65
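
The raise rule of Problem #1 can be sketched as follows. This is an illustrative Python fragment, not one of the students' PLAGO programs; the function names, the tuple layout of a record, and the rule of counting years of employment as a simple difference of calendar years are all assumptions.

```python
SALARY_CAP = 16000  # dollars per year, from the problem statement

def years_of_employment(dhire, as_of_year):
    """DHIRE is 'MM DD YYYY'; the year occupies characters 7-10.
    Assumption: employment is counted as a difference of calendar years."""
    return max(0, as_of_year - int(dhire[6:10]))

def adjusted_salary(present_salary, dependents_incl_self, years_employed):
    """$100 per dependent (employee included) plus $50 per year of
    employment, applied only below the cap and never exceeding it."""
    if present_salary >= SALARY_CAP:
        return present_salary
    raise_amount = 100 * dependents_incl_self + 50 * years_employed
    return min(present_salary + raise_amount, SALARY_CAP)

def payroll_report(records, as_of_year):
    """Approach (a): scan every record in order.
    Each record is (name, dhire, present_salary, dependents_incl_self)."""
    report = []
    for name, dhire, salary, dependents in records:
        years = years_of_employment(dhire, as_of_year)
        report.append((name, salary, adjusted_salary(salary, dependents, years)))
    return report
```

Approach (b) would sort the records by salary first and stop at the first salary of $16,000 or more; the per-person computation is unchanged.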

Problem #2


Finding the Roots of a Cubic Equation

1.  Statement of Problem

    The polynomial equation $a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0$ is to be
    solved for its three roots. A general solution is desired which
    will work for any finite real values of $a_3$, $a_2$, $a_1$, and $a_0$. The
    values of $a_3$, $a_2$, $a_1$, and $a_0$ are to be acquired as floating (single
    precision) input data at the beginning of each run. Write your
    program with a loop so that it reads any number of data cards and
    terminates on the last data card. Make up your own test data; however,
    the program will be tested finally with supplied data cards.


2.  Approaches

    (a) The general formula (i.e., Cardano's formula) for the
        solution of a cubic (see attachment A) is to be used to
        compute the roots.

    (b) An iterative solution for a single real root is to be obtained.
        Once the real root is removed, the quadratic formula is to
        be used to solve for the other two roots.


3.  Language and Computer: PLAGO on Poly 360/65
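
Approach (b) can be illustrated with a short Python sketch. The problem statement does not name an iteration scheme, so bisection is used here purely as an assumption; the function name and the requirement that $a_3$ be nonzero are likewise choices made for this example.

```python
import cmath

def solve_cubic(a3, a2, a1, a0, iterations=200):
    """Roots of a3*x^3 + a2*x^2 + a1*x + a0 = 0 for real coefficients.
    One real root is found by bisection (an assumed choice of iteration),
    then removed by synthetic division; the quadratic formula gives the
    other two roots."""
    if a3 == 0:
        raise ValueError("a3 must be nonzero for a cubic")

    def f(x):
        return ((a3 * x + a2) * x + a1) * x + a0

    # Cauchy bound: every root satisfies |x| < 1 + max|a_k / a3|,
    # so f changes sign between -bound and +bound.
    bound = 1.0 + max(abs(a2), abs(a1), abs(a0)) / abs(a3)
    lo, hi = -bound, bound
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            lo = hi = mid
            break
        if (f_lo < 0) != (f_mid < 0):
            hi = mid            # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid
    r = 0.5 * (lo + hi)

    # Deflate: (a3 x^3 + a2 x^2 + a1 x + a0) = (x - r)(b2 x^2 + b1 x + b0)
    b2 = a3
    b1 = a2 + r * b2
    b0 = a1 + r * b1
    disc = cmath.sqrt(b1 * b1 - 4.0 * b2 * b0)
    return [complex(r), (-b1 + disc) / (2.0 * b2), (-b1 - disc) / (2.0 * b2)]
```

Bisection is slower than Newton's method but cannot diverge, which keeps the sketch short; complex arithmetic is used for the quadratic so that a conjugate pair of roots is returned without a separate branch.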

Problem #3


Manipulation of a File of Research Reports

1.  Statement of the Problem

    At present the Reliability, Safety, and Software Engineering Group
    at the Polytechnic has about 1000 reports, papers, books, and journal
    proceedings in its library. Each item is entered on an index card in
    a file box. It is anticipated that the library may eventually grow to
    10,000 items. In the near future, two punched cards
    will be created for each of the items in the library, and we wish to
    create a program to perform various searches, sorts, and listings.

    Assume that the two punched cards together contain 4 character fields.
    The first field is 30 characters wide and contains the author(s) name(s).
    The second field is 50 characters wide and contains the full or
    abbreviated title. (No important words in the title are to be
    abbreviated.) The second card contains field 3, which is 50 characters
    wide and contains key words (or abbreviated key words) in the item.
    The last field is 30 characters wide and contains the source (journal,
    issue, book publisher, proceedings, etc.) of the item.

    The program must be able to perform the following tasks, and the
    selection (and possibly sequence) of tasks to be performed must be
    controlled by the first data card, which will serve as a program
    control card. Provide a means of performing tasks on the same run.

    (1) Read in a variable length stack of item cards and store
        them;

    (2) Alphabetize the data by first author;

    (3) Print out the list of items;

    (4) Create a list of key words (from field 3), eliminate
        duplicates, alphabetize, and print out;

    (5) Search the author field for a given author's name, and
        print out the list of items he has written;

    (6) Search the key word field for items which contain the
        "intersection" (AND) of one, two, or three input key
        words;

    (7) Provide the same search facility as (6) on words in the
        title field.

    The programmer should create his own test cards, and final
    testing will be performed with a supplied deck of item cards.

    (8) Approaches: The programmer should provide his own
        approaches.

    (9) Language and computer: PLAGO on 360/65
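
Tasks (4), (6), and (7) of Problem #3 can be sketched as below. This Python fragment is only an illustration: the function names are hypothetical, the fields are assumed to be packed contiguously from column 1 of each 80-column card, and key words are assumed to be runs of letters and digits compared without regard to case.

```python
import re

def parse_item(card1, card2):
    """Split the two cards of one item into the four fields.
    Assumed layout: card 1 = author (30) + title (50),
    card 2 = key words (50) + source (30)."""
    c1, c2 = card1.ljust(80), card2.ljust(80)
    return {
        "authors": c1[0:30].strip(),
        "title": c1[30:80].strip(),
        "keywords": c2[0:50].strip(),
        "source": c2[50:80].strip(),
    }

def words(text):
    # Assumption: a word is a run of letters or digits, case ignored.
    return set(re.findall(r"[A-Za-z0-9]+", text.upper()))

def keyword_list(items):
    """Task (4): all key words, duplicates removed, alphabetized."""
    found = set()
    for item in items:
        found |= words(item["keywords"])
    return sorted(found)

def search_and(items, field, query_words):
    """Tasks (6) and (7): items whose field contains ALL of the one,
    two, or three query words."""
    wanted = {w.upper() for w in query_words}
    if not 1 <= len(wanted) <= 3:
        raise ValueError("supply one, two, or three key words")
    return [item for item in items if wanted <= words(item[field])]
```

A call such as `search_and(items, "keywords", ["software", "reliability"])` returns only the items carrying both words, and passing `"title"` as the field gives the search of task (7).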

Problem #4


Specifications for Ballot Counting Procedure

Terms: An election consists of sets of ballots to elect persons for various
committees. Each set of ballots is called a committee election. There may
be up to 10 committee elections for an election.

All ballots for a particular committee have the name of the committee
punched in columns 61-80 and contain the names of the candidates to
that committee. There are up to 25 nominees for each committee.

When a person votes for a candidate, an "11" punch (i.e., a minus
sign) is punched in the ballot in the field consisting of columns 21-45
corresponding to candidates 1-25. Such a punch is a mark. A particular ballot
may have more than one mark since a particular committee may have more
than one vacant position (i.e., there may be more than one vote allowed to
each voter on each ballot).

An election package consists of one ballot for each committee.
Each eligible voter receives one and only one election package, which he
marks and returns for counting.


Specifications: Prior to counting the ballots, the program must read
certain preliminary information concerning the election.

For each committee election the program must be informed as to:

1.  the name of the committee

2.  the number of candidates

3.  the number of marks permitted (i.e., if "vote for three"
    then three marks are allowed).

A rough flow chart is:


DETERMINE VALIDITY:

1.  Check election name (cols. 61-80). If name is found then

2.  Check all marks (cols. 21-45) to see if all are ' ' or '-'.
    If ok then

3.  Check that no mark occurs after last candidate. If ok then

4.  Count number of marks. If total is less than or equal to the
    number of marks permitted, then ballot is valid.

If any test fails then ballot is invalid.
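
The four validity tests and the tallying step can be sketched in Python as follows. This is an illustration only, not the medium size program written for the tests: the function names, the dictionary that holds the preliminary election information, and the `make_card` helper used to build example ballots are all hypothetical.

```python
def make_card(marks, committee):
    """Hypothetical helper: build an 80-column ballot with the mark field
    in columns 21-45 and the committee name in columns 61-80."""
    return " " * 20 + marks.ljust(25) + " " * 15 + committee.ljust(20)

def is_valid_ballot(card, committees):
    """committees maps name -> (number of candidates, marks permitted)."""
    card = card.ljust(80)
    name = card[60:80].strip()                  # test 1: cols. 61-80
    if name not in committees:
        return False
    n_candidates, marks_permitted = committees[name]
    field = card[20:45]                         # cols. 21-45
    if any(ch not in " -" for ch in field):     # test 2: blank or '-' only
        return False
    if "-" in field[n_candidates:]:             # test 3: no mark past last candidate
        return False
    return field.count("-") <= marks_permitted  # test 4: mark count

def tally(cards, committees):
    """Return (counts per committee, valid ballots, invalid ballots)."""
    counts = {name: [0] * n for name, (n, _) in committees.items()}
    valid = invalid = 0
    for card in cards:
        if not is_valid_ballot(card, committees):
            invalid += 1
            continue
        valid += 1
        padded = card.ljust(80)
        name = padded[60:80].strip()
        for k in range(committees[name][0]):
            if padded[20 + k] == "-":
                counts[name][k] += 1
    return counts, valid, invalid
```

The per-committee lists play the role of the 25 x 10 tally array mentioned in the specification, sized here to the actual number of candidates.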

PROCESS  BALLOT: 

Consists  of  tallying  the  marks  in  some  sort  of  array,  probably 
25  x 10. 


OUTPUT: 


Printing  tallies. 

Typical output for 3 committees named PPC, SAB, TENURE:

[Sample tally table with a CANDIDATE NUMBER column and one column of
tallies per committee (e.g., TENURE); the tally values are not
recoverable from the source.]

THERE WERE 50 VALID BALLOTS

THERE WERE 4 INVALID BALLOTS AND THESE ARE REPRINTED ON
PREVIOUS PAGE

If there are no invalid ballots, then the last line can be left unprinted.
The sample output has 10 rows of tallies since SAB has 10 candidates.
Provision should be made for printing any number of rows from a minimum
of 2 to a maximum of 25, depending on the maximum number of candidates
for any committee.

3.6.5 The Reporting Form

After several revisions, the following reporting form has been selected.


POLYTECHNIC INSTITUTE OF NEW YORK
Department of EE/EP
Division of Computer Science
Safety, Reliability and Software Engineering Group
RADC Contract F30602-74-C-0294 (Softy)


INDIVIDUAL ERROR REPORTING FORM
(This must be completed for each non-syntax error)


1.  Identification

    (a) Programmer's Name ____________
    (b) Program Title ____________
    (c) Date ____________
    (d) Form Number (for this program) ____________
    (e) Description of the error (be precise) ____________
    (f) Description of the correction (be precise) ____________

2.  Means of Detection - check only for Corrections (not New Reqs.)
    - More than one category may be checked

    [ ] a. Hand Processing            [ ] d. Interrupt Error (Code ____)
    [ ] b. Personal Communication     [ ] e. Incorrect Output or Result
    [ ] c. Infinite Loop              [ ] f. Missing Output
    [ ] g. Other - Explain

3.  Effort to Diagnose the Error - Do not include effort spent in initial detection

    a. No. of Runs to Diagnose ____    Elapsed Computer Time (Minutes) ____
    b. Working Time to Diagnose: Hours ____

4.  Category of Change

    SOFTWARE CHANGE REQUIRED

    Nature of Change

    [ ] Documentation (Preface or Comments)
    [ ] Fix Instruction
    [ ] Change Constants
    [ ] Structural
    [ ] Algorithmic
    [ ] Other - Explain

    Source of Bug

    [ ] Bug essentially unrelated to previous corrections (i.e., usual case of
        bug just discovered)
    [ ] Previous correction did not remove the believed error (i.e., improper or
        incomplete analysis)
    [ ] New bug, introduced by a previous correction (i.e., bug generation through
        a correction)

    [ ] Misinterpretation of Specifications        [ ] Operating System
    [ ] Wrong Specifications                       [ ] Support Software
    [ ] Incomplete Specifications                  [ ] Card Mispunched
    [ ] Incorrect Sequencing of Computations       [ ] Other - Explain
    [ ] Incorrect Input Data (Type and Quantity)
    [ ] Incorrect Expressions
    [ ] Incorrect Declaration
    [ ] No Defense Against Invalid Data

5.  Difficulty of Correction

    a. No. of Runs to Correct ____    Elapsed Computer Time (Minutes) ____
    b. Working Time to Debug: Days ____ Hours ____
    c. No. of Cards: Changed ____ Added ____ Deleted ____

6.  Comments (Use Reverse Side and Additional Sheets if Necessary)

3.6.6 The Present Status and Planned Activities

Problems 1, 2 and 3 have been debugged by single programmers, that
is, by just their authors. A set of test data has been selected for problem
1, and the several versions of the first program have been exercised
successfully with this test data.

It is planned to perform the additional debugging with other programmers
and to use the test data for the experimental small scale verification
of our theoretical work.


1.  M. Halstead, "Software Physics: Basic Principles," IBM Research
    Report RJ1582, IBM Research, Yorktown Heights, N.Y., May 1975.


3.7 Micro Reliability Models
by M. L. Shooman


3.7.1 Introduction

Many previous software reliability prediction models by this author¹
and others² have concentrated on the bulk (macro) aspects of a program.
This work involves a newly developed micro model³ which is based on
program structure.

It is assumed that the program has been written in structured or
modular form so that decomposition into its constituent parts is simple.
Further, we assume that via analysis of the program the decomposition can
be related to several paths or other functional structures within the
program.

The model is constructed based upon the frequency with which each
path j is run, $f_j$, the running time of each path, $t_j$, and the
probability of error along each path, $q_j$.


Several methods of calculating or measuring the $f_j$, $t_j$, and $q_j$
parameters are suggested. In fact, it is possible to use one technique (historical
data) to produce crude estimates at the start of the design, and to refine the
estimates with more accurate values as the design progresses. Given the
existence of such a model, we can consider the application of three
important design techniques which are impossible with a macroscopic model:

(1) Apportionment of the software reliability (or mean time to failure)
    specification among the subsystems so that each design team has its
    own goal to meet. The apportionment is done so that the
    subsystem reliabilities combine to yield a system reliability which
    meets system specifications.

(2) If design proceeds either bottom up or top down, eventually there is
    a system integration phase where all parts are put together and tried
    out. The macroscopic models developed previously could not be
    applied before the system reached the integration stage. However, the
    new microscopic model proposed can be used to combine the results
    of the module development phases to predict a preliminary software
    reliability index before the system integration phase.

(3) The microscopic model is based upon measurements made on the
    software design. Such measurements and analyses performed on the
    software lead to a more disciplined design and provide insight into
    how the module performance relates to overall software system
    performance.


3.7.2 Micro Decomposition Model

The micro decomposition model proposed in this section
is based upon several assumptions. We first assume that the
program has been designed using a structured or modular philosophy
and that, as a result, there emerges a natural structure of the program
which can be described as consisting of a number of paths, cases,
parts, modules, or subprograms. The decomposition focuses on
this natural structure. From now on we will primarily use the term
paths to designate the paths, cases, parts, modules,
subprograms, or any other important substructure. We also assume
that the majority of the paths are independent of each other. (One
could probably tolerate some type of dependence in the model if it
were limited.)

The decomposition model will be developed from the probabilistic
viewpoint of relative frequency. We will hypothesize a sequence of
tests which either uncover a bug (failure) or run to completion without
uncovering a bug (success). We begin our development of the
model by defining the following variables and parameters:

    N     = The number of tests

    i     = The number of software paths (cases, parts, modules, etc.)

    $t_i$ = Time to run case i (if time is not deterministic we can
            substitute the mean value of $t_i$, i.e., $\bar{t}_i$).

    $q_i$ = Probability of error on each run of case i (the probability
            of no error is $p_i = 1 - q_i$).

    $f_i$ = Frequency with which case i is run.

    $n_f$ = Total number of failures in N tests.

    H     = Total cumulative test time in hours.

Note that in the above set of definitions we have defined N as the number of
tests. Thus, we are modeling actual or simulated operation by a succession
of N tests (path traversals) of the system. We also assume the input
data varies on each traversal. This is the reason why we have assigned a
constant, $q_j$, as the probability of encountering a bug on each run.

If there were no variation in input parameters, and three successive
tests each traversed path j, then the probability of encountering an error
on the first trial would be $q_j$. The conditional probability of encountering
a bug on the second traversal of the same path with the same parameters
is unity. Similarly, the probability on the same path with the same
parameters the third time is also unity. Thus, the probability of a bug on three
traversals of path j is $q_j \times 1 \times 1 = q_j$.

Since we have assumed a variation in parameters on each run in our
model and each test is independent, the probability of encountering one
bug on three successive traversals of path j is given by the binomial
distribution as

$$P(\text{1 error in three trials}) = \binom{3}{1} q_j (1 - q_j)^2 \qquad (1)$$

Similarly, the expected number of occurrences in a probabilistic process
governed by a binomial distribution is

$$\text{Number of Occurrences} = Nq \qquad (2)$$

where N is the number of trials and q the probability of occurrence.
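
Equations (1) and (2) can be checked numerically with a short Python sketch. The function names and the sample value $q_j = 0.1$ are assumptions chosen only for illustration.

```python
from math import comb

def prob_k_errors(k, n, q):
    """Binomial probability of exactly k errors in n independent
    traversals, each with error probability q. Eq. (1) is k=1, n=3."""
    return comb(n, k) * q ** k * (1.0 - q) ** (n - k)

def expected_occurrences(n, q):
    """Eq. (2): mean of the binomial distribution."""
    return n * q

# Illustrative value only: q_j = 0.1 gives 3 * 0.1 * 0.9^2 = 0.243.
p_one_in_three = prob_k_errors(1, 3, 0.1)
```

Summing k times `prob_k_errors(k, n, q)` over all k reproduces `expected_occurrences(n, q)`, which is the sense in which Eq. (2) follows from the same distribution as Eq. (1).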


3.7.3 Development of the Model

We can now compute the total number of failures $n_f$ in N tests. The
tests are distributed along each path such that $N f_1$ tests traverse path 1,
$N f_2$ tests traverse path 2, etc. Thus, successive application of Eq. (2) to
each of the i paths yields for the number of failures in N tests

$$n_f = N f_1 q_1 + N f_2 q_2 + \cdots + N f_i q_i \qquad (3)$$

We can now compute the system probability of failure on any one test run,
$q_0$, by taking the ratio $n_f / N$ as N approaches infinity:

$$q_0 = \lim_{N \to \infty} \frac{n_f}{N} = \sum_{j=1}^{i} f_j q_j \qquad (4)$$
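
As a sketch of Eq. (4), the following Python fragment computes $q_0$ directly and also estimates it by simulating N tests. The function names, the sample parameter values, and the check that the frequencies sum to one (implied by $N f_j$ tests traversing path j) are assumptions made for this example.

```python
import random

def system_failure_probability(f, q):
    """Eq. (4): q_0 = sum over paths of f_j * q_j."""
    if abs(sum(f) - 1.0) > 1e-9:
        raise ValueError("path frequencies are assumed to sum to 1")
    return sum(fj * qj for fj, qj in zip(f, q))

def simulate_failure_fraction(f, q, n_tests, seed=0):
    """Run n_tests simulated traversals: pick a path with probability f_j,
    then fail with probability q_j. Returns n_f / N."""
    rng = random.Random(seed)
    paths = range(len(f))
    failures = 0
    for _ in range(n_tests):
        j = rng.choices(paths, weights=f)[0]
        if rng.random() < q[j]:
            failures += 1
    return failures / n_tests

# Illustrative parameters (not measured data).
f = [0.5, 0.3, 0.2]
q = [0.01, 0.02, 0.05]
q0 = system_failure_probability(f, q)   # 0.005 + 0.006 + 0.010
```

The simulated ratio approaches $q_0$ only as N grows, which mirrors the limit taken in Eq. (4).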


Similarly, we can compute the system failure rate, $z_0$, by first
computing the total number of test hours. First we compute the total number of
traversals of path i as $N f_i$, as was previously done. Out of these
traversals the fraction $p_i$ will be successful and will accumulate $N f_i p_i t_i$
hours of successful operation. If we assume that the time-to-failure
distribution for the $N f_i q_i$ traversals which result in failure is rectangular, then
each trial which results in failure runs $t_i / 2$ hours on the average before
failure. Thus, the total test time accumulated in N runs is given by

$$H = N f_1 p_1 t_1 + N f_1 q_1 \frac{t_1}{2} + N f_2 p_2 t_2 + N f_2 q_2 \frac{t_2}{2} + \cdots + N f_i p_i t_i + N f_i q_i \frac{t_i}{2} = N \sum_{j=1}^{i} f_j t_j \left( p_j + \frac{q_j}{2} \right) \qquad (5)$$

Substitution of $p_j = 1 - q_j$ in Eq. (5) and simplification yields

$$H = N \sum_{j=1}^{i} f_j t_j \left( 1 - \frac{q_j}{2} \right) \qquad (6)$$

We now compute the system failure rate $z_0$ as

$$z_0 = \lim_{N \to \infty} \frac{n_f}{H} \qquad (7)$$

and substitution from Equations (3) and (6) into (7) yields in the limit

$$z_0 = \frac{\sum_{j=1}^{i} f_j q_j}{\sum_{j=1}^{i} f_j t_j \left( 1 - \frac{q_j}{2} \right)} \qquad (8)$$
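
Equations (5), (6), and (8) translate directly into code. The Python sketch below is illustrative only; the function names and the sample parameter values are assumptions, with path times expressed in hours.

```python
def total_test_hours(n_tests, f, q, t):
    """Eq. (6): H = N * sum of f_j * t_j * (1 - q_j / 2)."""
    return n_tests * sum(fj * tj * (1.0 - qj / 2.0) for fj, qj, tj in zip(f, q, t))

def total_test_hours_eq5(n_tests, f, q, t):
    """Eq. (5): successful runs last t_j, failed runs t_j / 2 on average."""
    return n_tests * sum(
        fj * (1.0 - qj) * tj + fj * qj * tj / 2.0 for fj, qj, tj in zip(f, q, t)
    )

def system_failure_rate(f, q, t):
    """Eq. (8): z_0 in failures per hour."""
    numerator = sum(fj * qj for fj, qj in zip(f, q))
    denominator = sum(fj * tj * (1.0 - qj / 2.0) for fj, qj, tj in zip(f, q, t))
    return numerator / denominator

# Illustrative parameters (not measured data); times in hours.
f = [0.5, 0.3, 0.2]
q = [0.01, 0.02, 0.05]
t = [1.0, 2.0, 4.0]
z0 = system_failure_rate(f, q, t)   # 0.021 / 1.8715 failures per hour
```

Note that N cancels between $n_f$ and H, so the failure rate depends only on the per-path parameters.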


3.7.4 Special Cases

We now wish to examine Equations (4) and (8) under special constraints.
These are listed in Table 1. Note that the units of $z_0$ are clearly seen from
case 4 to be failures per hour, or just hr$^{-1}$.

3.7.5 Measurement of Parameters


In order to implement the model developed in the previous sections
we must develop numerical values for the sets of parameters $f_j$, $q_j$, and $t_j$.
Of course, in keeping with the concept of structured programming and levels
of structures within levels, one could merely state that we continue
decomposition to lower levels until we end up with a new set of $f'_j$, $q'_j$, and $t'_j$
parameters at a lower level. Clearly, the answer to the question of how we
could measure or estimate our parameters at a higher level is also, by
and large, an answer to how we would do the measurement at a lower
level.

The parameter sets $f_j$ and $t_j$ are related to the structure, size, and
complexity of the control structure and program modules. The determination
of the $f_j$ can be made by a study of the physical meaning of the paths
and the distributions of input parameters which drive one along the
program paths. If the program is complex, or there is really no information
on input statistics, we can take one of two approaches: assume $f_j$ has a
uniform distribution (see case 2 of Table 1), or insert counters in the
various paths and experimentally determine the $f_j$. The experimental
approach requires that the program be in reasonably good shape so that a
simulated test program can be run. Clearly, if a counter $c_j$ is placed in
each path such that it registers one count for each path traversal, and we
run N tests, then $f_j = c_j / N$.
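
The counter technique can be sketched in Python as follows. The names are hypothetical: `run_one_test` stands for an instrumented program that reports which path it traversed, and `classify` is a toy three-path program used only to exercise the idea.

```python
from collections import Counter

def estimate_frequencies(run_one_test, test_inputs):
    """Count traversals c_j of each path over N tests and return
    the estimates f_j = c_j / N."""
    counts = Counter()
    n_tests = 0
    for x in test_inputs:
        counts[run_one_test(x)] += 1   # one count per path traversal
        n_tests += 1
    return {path: c / n_tests for path, c in counts.items()}

def classify(x):
    """Toy program with three paths; returns the label of the path taken."""
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    return "positive"

# Eight tests: 2 negative, 1 zero, 5 positive inputs.
freqs = estimate_frequencies(classify, [-2, -1, 0, 1, 2, 3, 4, 5])
```

The estimates are only as representative as the test inputs, which is why the text ties the $f_j$ to the distribution of input parameters.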


The set of $t_j$ parameters can also be either calculated or measured.
If the program is written in assembly, machine, or microprogramming
code, one can estimate quite closely the run time of a sequence of code by
summing the operating times of each instruction. If the program is complex,
one can write an analysis program to read the code and perform the time
analysis to determine $t_j$. In the case of a higher level language
(FORTRAN, PL/1, COBOL, etc.) the analysis is more complex, because
each statement may expand into one to, say, ten machine language statements.
Several approaches are possible. First of all, one can obtain a core dump
of the machine language program and proceed as has been described.
Another alternative is to insert a block of higher level code inside a DO I = 1
to K loop. The loop is run for a particular value of K and the C.P.U.
time of the computer is recorded. The value of K is changed, another run
is made, and its C.P.U. time is recorded. With about 3 values of C.P.U. time
vs. K, an accurate enough straight line or polynomial model of run time vs. K
can be fitted to the data.* One can then use the formula to predict the run
time of the actual code block by substituting the number of repetitions (the true
value of K). Of course, if the program and a simulation are available, one can
merely run several test runs for each path, record the times, and use
average values for each path.


* It is necessary to take several measurements for two reasons. First
of all, there is program overhead which may vary from run to run
depending on the operating system. (There is also DO loop overhead.) Second,
the recording of C.P.U. time is not accurate for short run times. To
correct for DO loop overhead and also system overhead, one can perform
the measurement with and without the code block in the loop and work
with the difference.
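
The loop-timing procedure, including the with-and-without correction for loop overhead, can be sketched in Python. This is an analogue of the DO-loop experiment, not the original measurement code: the function names and the default values of K are assumptions, and `time.process_time` stands in for the recorded C.P.U. time.

```python
import time

def cpu_time_of_loop(k, block=None):
    """C.P.U. time for k passes of a loop, with or without the code block.
    The same test is executed in both cases so the overhead matches."""
    start = time.process_time()
    for _ in range(k):
        if block is not None:
            block()
    return time.process_time() - start

def fit_line(points):
    """Least-squares straight line y = slope * k + intercept."""
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in points)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_k

def time_per_execution(block, ks=(20000, 40000, 80000)):
    """Estimate the run time of one execution of block from the slope of
    (time with block - time without block) versus K."""
    diffs = [(k, cpu_time_of_loop(k, block) - cpu_time_of_loop(k)) for k in ks]
    slope, _ = fit_line(diffs)
    return slope
```

Working with the slope rather than a single measurement discards any fixed start-up cost, and subtracting the empty-loop time removes the per-pass loop overhead, as the footnote recommends. Measured values will vary from run to run and machine to machine.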

The estimation of the $q_j$ parameters is somewhat more difficult.
During the early stages of design or development one can try to estimate
$q_j$ using historical data. One way to derive the $q_j$ parameter is to obtain
failure rate data on the program and equate it to $z_0$ using the assumptions
of case 3 or 4 of Table 1, and solve for $q_j$. This process can be repeated
as the program is written and better values for $q_j$ are determined. There is a
possibility that one could calculate $q_j$ from a more basic procedure.
Knuth has shown that most FORTRAN statements are relatively
simple and fall into one of several classes. If each of these classes also
has a characteristic error rate, then by analysis of the $q_j$ values for
several examples, we should be able to derive characteristic values for the $q_j$
parameters.
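
As a worked illustration of solving for $q_j$ from a measured failure rate, suppose that every path has the same error probability q. This equal-q simplification is an assumption made here for the example; whether it coincides with case 3 or case 4 of Table 1 cannot be confirmed from this section. Setting $q_j = q$ in Eq. (8) gives $z_0 = q / [(1 - q/2) T]$ with $T = \sum_j f_j t_j$, and solving for q gives $q = z_0 T / (1 + z_0 T / 2)$. In Python, with hypothetical function names:

```python
def failure_rate_common_q(q, f, t):
    """Eq. (8) with every q_j equal to q (an assumed simplification)."""
    mean_time = sum(fj * tj for fj, tj in zip(f, t))   # T = sum f_j t_j
    return q / ((1.0 - q / 2.0) * mean_time)

def common_q_from_failure_rate(z0, f, t):
    """Invert the expression above: q = z0*T / (1 + z0*T/2)."""
    mean_time = sum(fj * tj for fj, tj in zip(f, t))
    return z0 * mean_time / (1.0 + z0 * mean_time / 2.0)
```

For small $z_0 T$ the result is close to $q \approx z_0 T$, that is, the failure rate multiplied by the mean time per test run.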

3.7.6 Conclusions


The model developed above allows one to decompose a program into
a number of modules, paths, modes, or other functional entities. One can
then compute an expression for the software failure rate in terms of
probabilistic and deterministic parameters which can be estimated from
historical data or determined by analysis or experiment. The model
provides a clear-cut procedure for relating the reliability of a large software
system to the reliability of its constituent parts. The model is presently
being applied to a number of modest size problems in order to obtain
typical parameter values and validate the model.

Rome Air Development Center


RADC plans and conducts research, exploratory and advanced
development programs in command, control, and communications
(C3) activities, and in the C3 areas of information sciences
and intelligence. The principal technical mission areas
are communications, electromagnetic guidance and control,
surveillance of ground and aerospace objects, intelligence
data collection and handling, information system technology,
ionospheric propagation, solid state sciences, microwave
physics and electronic reliability, maintainability and
compatibility.


